Abstract

We provide a method that enables the simple calculation of the maximal
correlation coefficient of a bivariate distribution, under suitable conditions.
In particular, the method readily applies to known results on order statistics
and records. As an application we provide a new characterization of the
exponential distribution: Under a splitting model on iid observations, it is the
(unique, up to a location transformation) parent distribution that maximizes the
correlation coefficient between the records among two different branches of the
splitting sequence.

1 Introduction

As is well-known, the Pearson correlation coefficient of the random variables X and
Y is defined as

$$\rho(X,Y) = \mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}},$$

provided that $0 < \mathrm{Var}(X) < \infty$ and $0 < \mathrm{Var}(Y) < \infty$. It assumes values in the
interval $[-1, 1]$ and it is a measure of linear dependence of X and Y. Although
$\rho(X,Y) = 0$ for independent X and Y, the converse is not true. Gebelein (1941)
introduced the maximal correlation coefficient,

$$R(X,Y) = \sup \mathrm{Corr}(g_1(X), g_2(Y)),$$

where the supremum is taken over all Borel functions $g_1 : \mathbb{R} \to \mathbb{R}$ and $g_2 : \mathbb{R} \to \mathbb{R}$
with $0 < \mathrm{Var}\,g_1(X) < \infty$ and $0 < \mathrm{Var}\,g_2(Y) < \infty$. In contrast to $\rho(X,Y)$, $R(X,Y)$
is defined whenever both X and Y are non-degenerate, assumes values in the interval
[0, 1] and vanishes if and only if X and Y are independent. The maximal correlation
coefficient plays a fundamental role in various areas of statistics; e.g., it is useful 
in obtaining optimal transformations for regression, Breiman and Friedman (1985), 
and it has applications in the convergence theory of Gibbs sampling algorithms, Liu 
et al. (1994). 

However, despite its usefulness, it is often difficult to calculate the maximal 
correlation coefficient in an explicit form, except in some rare cases. A well-known 
exception is the result of Gebelein (1941) and Lancaster (1957) who showed the 
interesting property that if (X, Y) is bivariate normal then 

$$R(X,Y) = |\mathrm{Corr}(X,Y)|. \tag{1}$$

Another exception is provided by the surprising result of Dembo et al. (2001), and 
its subsequent extensions given by Bryc et al. (2005) and Yu (2008). In its general 
form the result states that for any iid non-degenerate random variables $X_1, \ldots, X_n$,

$$R(X_1 + \cdots + X_m,\; X_{k+1} + \cdots + X_n) = \frac{m-k}{\sqrt{m(n-k)}}, \qquad 1 \le k+1 \le m \le n.$$

Finally, we mention an important result of Szekely and Mori (1985), who showed,
using Jacobi polynomials, that if (X, Y) follows a bivariate density of the form

$$f(x,y) = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}\, x^{\alpha-1}(y-x)^{\beta-1}(1-y)^{\gamma-1}, \qquad 0 < x < y < 1 \tag{2}$$

(where the parameters $\alpha, \beta, \gamma$ are positive), then

$$R(X,Y) = \mathrm{Corr}(X,Y) = \sqrt{\frac{\alpha\gamma}{(\alpha+\beta)(\beta+\gamma)}}. \tag{3}$$

Observe that for any integers $1 \le i < j \le n$, the density of the pair of order statistics
$(U_{i:n}, U_{j:n})$, based on n iid observations from the standard uniform distribution,
is of the form (2) (with $\alpha = i$, $\beta = j - i$, $\gamma = n + 1 - j$). Actually, (3) extends Terrell's
(1983) characterization of rectangular distributions through maximal correlation of
an ordered pair.

In this article we provide a unified method for obtaining the maximal correlation 
when the bivariate distribution has a particular structure (diagonal structure - see 
next section). The method is very simple (e.g., it readily applies to verify (1) and
(3)) and it does not require knowledge of particular sets of orthogonal polynomials.
As notable examples, some known related characterizations of specific distributions
through maximal correlation of ordered data and records are presented in Section
3. Finally, in Section 4 we consider a splitting model based on iid observations.
Applying this method it is shown that the records among two different branches 
of the splitting sequence are maximally correlated if and only if the population 
distribution is exponential (up to a location transformation) - this fact extends a 
characterization of Nevzorov (1992). 

2 The maximal correlation coefficient of bivariate distributions having 
diagonal structure 

Let (X, Y) be an arbitrary random vector with distribution function F(x, y) and 
assume that both X and Y are non-degenerate. We say that F (or the vector 
(X,Y)) has diagonal structure if the following three conditions are satisfied. 

A1. We assume that both X and Y have all their moments finite:

$$\mathrm{E}|X|^n < \infty \quad\text{and}\quad \mathrm{E}|Y|^n < \infty \quad\text{for } n = 1, 2, \ldots. \tag{4}$$

It is known that, under (4), there exists a (unique) orthonormal polynomial
system (OPS) $\{\phi_n(x) = p_n x^n + \mathrm{Pol}_{n-1}(x),\ p_n > 0,\ n = 0, 1, \ldots\}$, corresponding
to X, and a (unique) OPS $\{\psi_n(y) = q_n y^n + \mathrm{Pol}_{n-1}(y),\ q_n > 0,\ n = 0, 1, \ldots\}$,
corresponding to Y, where $\phi_0(x) = \psi_0(y) = 1$ and $\mathrm{Pol}_k(t)$ denotes an arbitrary
polynomial in t of degree less than or equal to k, that may change from line to line.
The orthonormality of the above OPS's means, as usual, that

$$\mathrm{E}[\phi_n(X)\phi_k(X)] = \mathrm{E}[\psi_n(Y)\psi_k(Y)] = \delta_{nk}, \qquad k, n = 0, 1, \ldots,$$

where $\delta_{nk}$ is Kronecker's delta. It should be noted that the OPS for X reduces to a
finite set, say $\{\phi_n(x)\}_{n=0}^{N}$, if and only if the support of X is concentrated on a finite
subset of $\mathbb{R}$ having $N + 1 \ge 2$ points; the same is true for the OPS of Y.

A2. We assume that the OPS $\{\phi_n(x)\}_{n=0}^{\infty}$ is complete in $L^2(X)$, the Hilbert space
of all Borel functions $g : \mathbb{R} \to \mathbb{R}$ with $\mathrm{Var}\,g(X) < \infty$ (note that two functions $g_1$, $g_2$
are considered as "equal" if $\mathrm{P}[g_1(X) = g_2(X)] = 1$). Similarly, we assume that the
system $\{\psi_n(y)\}_{n=0}^{\infty}$ is complete in $L^2(Y)$.

A3. We assume that the random vector (X, Y) has the polynomial regression
property, that is,

$$\mathrm{E}(X^n \mid Y) = A_n Y^n + \mathrm{Pol}_{n-1}(Y), \qquad n = 1, 2, \ldots,$$
$$\mathrm{E}(Y^n \mid X) = B_n X^n + \mathrm{Pol}_{n-1}(X), \qquad n = 1, 2, \ldots,$$

where $A_n, B_n \in \mathbb{R}$.

The assumptions Al and A2 are not restrictive since, e.g., they are satisfied 
whenever both X and Y have finite moment generating functions in a neighborhood 
of 0; see, e.g., Koudou (1998) and Afendras et al. (2011). However, this is not
the case for assumption A3, since it applies to very particular distributions, as the 
following lemma shows. 

Lemma 2.1. Using the above notation and assuming A1-A3 we have that for all
$n, k \in \{1, 2, \ldots\}$,

$$\mathrm{E}[\phi_n(X)\psi_k(Y)] = \delta_{nk}\,\rho_n, \tag{5}$$

where $\delta_{nk}$ is Kronecker's delta and $\rho_n = \mathrm{E}[\phi_n(X)\psi_n(Y)] \in [-1, 1]$.

Proof: Since $\phi_n(X)$ and $\psi_n(Y)$ are standardized random variables, we have
$\rho_n = \mathrm{Corr}[\phi_n(X), \psi_n(Y)]$ and, therefore, $\rho_n \in [-1, 1]$. Now, if $1 \le k < n$ then A3 yields

$$\mathrm{E}[\phi_n(X)\psi_k(Y)] = \mathrm{E}\{\phi_n(X)\,\mathrm{E}[\psi_k(Y) \mid X]\} = \mathrm{E}[\phi_n(X)\,\mathrm{Pol}_k(X)] = 0,$$

because $\phi_n$ is orthogonal to any polynomial of degree at most $n - 1$. Similar arguments
apply to the case $1 \le n < k$, and the proof is complete. □

The bivariate distributions satisfying (5) are sometimes called Lancaster distributions
and the correlations $\rho_n$ form a Lancaster sequence with respect to X and
Y; see Lancaster (1969); cf., e.g., Koudou (1996, 1998). Therefore, by Lemma 2.1
we see that assumption A3 forces a distribution to be a Lancaster one. Under certain
conditions, the density of a Lancaster distribution (if it exists) has the formal
representation (diagonal structure)

$$f(x,y) = f_X(x) f_Y(y) \Big\{ 1 + \sum_{n=1}^{\infty} \rho_n \phi_n(x) \psi_n(y) \Big\},$$

where $f_X$ and $f_Y$ are the marginal densities of X and Y.

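Another useful observation is the following: If the assumptions A1-A3 are satisfied
then we can calculate each $\rho_n$, and this calculation does not require any knowledge
of the polynomial systems $\{\phi_n(x)\}_{n=0}^{\infty}$ and $\{\psi_n(y)\}_{n=0}^{\infty}$. Indeed, we have the
following

Lemma 2.2. Using the above notation and assuming A1-A3 we have that for all
$n \in \{1, 2, \ldots\}$,

$$A_n B_n \ge 0, \qquad \rho_n = \mathrm{sign}(A_n)\sqrt{A_n B_n} \qquad\text{and}\qquad |\rho_n| = \sqrt{A_n B_n}. \tag{6}$$

Proof: Since $\phi_n(X) = p_n X^n + \mathrm{Pol}_{n-1}(X)$ and $\psi_n(Y) = q_n Y^n + \mathrm{Pol}_{n-1}(Y)$ we have

$$\begin{aligned}
\rho_n &= \mathrm{E}\{\psi_n(Y)\,\mathrm{E}(\phi_n(X) \mid Y)\} = \mathrm{E}\{\psi_n(Y)[p_n \mathrm{E}(X^n \mid Y) + \mathrm{Pol}_{n-1}(Y)]\} \\
&= p_n \mathrm{E}[\psi_n(Y)\,\mathrm{E}(X^n \mid Y)] + 0 = p_n \mathrm{E}\{\psi_n(Y)[A_n Y^n + \mathrm{Pol}_{n-1}(Y)]\} \\
&= p_n A_n \mathrm{E}[\psi_n(Y) Y^n] + 0 = p_n A_n \mathrm{E}\{\psi_n(Y)\, q_n^{-1}[\psi_n(Y) - \mathrm{Pol}_{n-1}(Y)]\} \\
&= \frac{p_n A_n}{q_n}\,\mathrm{E}[\psi_n^2(Y)] - 0 = \frac{p_n}{q_n} A_n .
\end{aligned}$$

This shows that $\rho_n$ and $A_n$ have the same sign. Using the same arguments (conditioning
on X) it follows that $\rho_n = \frac{q_n}{p_n} B_n$; thus, $\rho_n^2 = A_n B_n$, and the proof is complete.
□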
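
As a hedged illustration of Lemma 2.2, the following minimal sketch computes $\rho_n$ from the leading regression coefficients alone. The function names `rho_from_regression` and `max_abs_rho`, and the truncation at a finite `n_max`, are assumptions made here for illustration; the coefficients used are those derived for the bivariate normal case in Section 3.

```python
import math

def rho_from_regression(a_n: float, b_n: float) -> float:
    """Return rho_n = sign(A_n) * sqrt(A_n * B_n), following (6)."""
    product = a_n * b_n
    if product < 0:
        raise ValueError("A_n * B_n must be non-negative under A1-A3")
    return math.copysign(math.sqrt(product), a_n)

def max_abs_rho(A, B, n_max: int) -> float:
    """Largest |rho_n| over n = 1..n_max (a finite stand-in for the supremum)."""
    return max(abs(rho_from_regression(A(n), B(n))) for n in range(1, n_max + 1))

# Illustrative (hypothetical) parameter values for a bivariate normal pair.
rho, s1, s2 = -0.6, 2.0, 0.5
A = lambda n: (rho * s1 / s2) ** n   # leading coefficient of E(X^n | Y)
B = lambda n: (rho * s2 / s1) ** n   # leading coefficient of E(Y^n | X)

print(rho_from_regression(A(1), B(1)))  # close to rho itself
print(max_abs_rho(A, B, 20))            # close to |rho|
```

The truncation is harmless here because $|\rho_n| = |\rho|^n$ is decreasing; for a general Lancaster sequence the supremum need not be attained within a fixed range.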

We are now in a position to state and prove our main result. 
Theorem 2.1. If the assumptions A1-A3 are satisfied then

$$R(X,Y) = \sup_{n \ge 1} |\rho_n| = \sup_{n \ge 1} \sqrt{A_n B_n}. \tag{7}$$

Moreover, if $|\rho_n| < |\rho_{n_0}|$ for all $n \ge 1$, $n \ne n_0$, then for any $g_1 \in L^2(X)$ with
$\mathrm{Var}\,g_1(X) > 0$ and for any $g_2 \in L^2(Y)$ with $\mathrm{Var}\,g_2(Y) > 0$ we have the inequality

$$\mathrm{Corr}[g_1(X), g_2(Y)] \le |\rho_{n_0}|,$$

with equality if and only if $g_1(x) = a_0 + a_1 \phi_{n_0}(x)$ and $g_2(y) = b_0 + b_1 \psi_{n_0}(y)$ for some
constants $a_0, b_0, a_1, b_1 \in \mathbb{R}$ with $a_1 b_1 \mathrm{sign}(A_{n_0}) > 0$.

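Proof: Let $g_1 \in L^2(X)$. By the completeness of $\{\phi_n\}_{n=0}^{\infty}$ it follows that $g_1$ admits
the representation

$$g_1(x) = \sum_{n=0}^{\infty} \alpha_n \phi_n(x), \quad\text{where}\quad \alpha_n = \mathrm{E}[g_1(X)\phi_n(X)] = \int_{\mathbb{R}} g_1(x)\phi_n(x)\,dF_X(x).$$

Here $F_X$ is the marginal distribution of X, $\{\alpha_n\}_{n=0}^{\infty}$ are the Fourier coefficients with
respect to the OPS $\{\phi_n\}_{n=0}^{\infty}$, and the series converges in the $L^2(X)$-sense, i.e.,

$$\lim_{N \to \infty} \mathrm{E}\Big[g_1(X) - \sum_{n=0}^{N} \alpha_n \phi_n(X)\Big]^2 = 0. \tag{8}$$

In particular, $\alpha_0 = \mathrm{E}[g_1(X)]$, and the above limit is usually written as Parseval's
identity,

$$\mathrm{Var}\,g_1(X) = \sum_{n=1}^{\infty} \alpha_n^2,$$

because it is easily seen that

$$\mathrm{E}\Big[g_1(X) - \sum_{n=0}^{N} \alpha_n \phi_n(X)\Big]^2 = \mathrm{Var}\,g_1(X) - \sum_{n=1}^{N} \alpha_n^2 .$$

Therefore, the assumption $\mathrm{Var}\,g_1(X) > 0$ implies that $\alpha_n \ne 0$ for at least one $n \ge 1$.
Similarly, for any $g_2 \in L^2(Y)$ we have

$$\mathrm{Var}\,g_2(Y) = \sum_{n=1}^{\infty} \beta_n^2, \quad\text{where}\quad \beta_n = \mathrm{E}[g_2(Y)\psi_n(Y)] = \int_{\mathbb{R}} g_2(y)\psi_n(y)\,dF_Y(y).$$

Here $F_Y$ is the marginal distribution of Y, $\{\beta_n\}_{n=0}^{\infty}$ are the Fourier coefficients with
respect to the OPS $\{\psi_n\}_{n=0}^{\infty}$ and, as before,

$$\lim_{N \to \infty} \mathrm{E}\Big[g_2(Y) - \sum_{n=0}^{N} \beta_n \psi_n(Y)\Big]^2 = \mathrm{Var}\,g_2(Y) - \lim_{N \to \infty} \sum_{n=1}^{N} \beta_n^2 = 0. \tag{9}$$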

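Using the above we can show that

$$\mathrm{E}[g_1(X)\psi_n(Y)] = \alpha_n \rho_n \quad\text{and}\quad \mathrm{E}[g_2(Y)\phi_n(X)] = \beta_n \rho_n, \qquad n = 1, 2, \ldots. \tag{10}$$

Indeed, for any $N \ge n$ we have

$$\mathrm{E}[g_1(X)\psi_n(Y)] = \mathrm{E}\Big[\Big(g_1(X) - \sum_{k=0}^{N} \alpha_k \phi_k(X)\Big)\psi_n(Y)\Big] + \sum_{k=0}^{N} \alpha_k \mathrm{E}[\phi_k(X)\psi_n(Y)].$$

Now since $N \ge n$, $\phi_0(x) = 1$, $\mathrm{E}[\psi_n(Y)] = 0$, $\mathrm{E}[\psi_n^2(Y)] = 1$ and $\mathrm{E}[\phi_k(X)\psi_n(Y)] =
\delta_{kn}\rho_n$ for $k \ge 1$, we conclude, in view of (8) and by using the Cauchy-Schwarz
inequality, that

$$0 \le \big(\mathrm{E}[g_1(X)\psi_n(Y)] - \alpha_n \rho_n\big)^2 = \Big(\mathrm{E}\Big[\Big(g_1(X) - \sum_{k=0}^{N} \alpha_k \phi_k(X)\Big)\psi_n(Y)\Big]\Big)^2 \le \mathrm{E}\Big[g_1(X) - \sum_{k=0}^{N} \alpha_k \phi_k(X)\Big]^2 \mathrm{E}[\psi_n^2(Y)] \to 0, \quad\text{as } N \to \infty;$$

therefore, since $(\mathrm{E}[g_1(X)\psi_n(Y)] - \alpha_n \rho_n)$ does not depend on N, we conclude the
first identity in (10), while the second one follows by the same arguments. Using
(10) it is easily seen that

$$\mathrm{E}\Big[\Big(g_1(X) - \sum_{n=0}^{N} \alpha_n \phi_n(X)\Big)\Big(g_2(Y) - \sum_{n=0}^{N} \beta_n \psi_n(Y)\Big)\Big] = \mathrm{Cov}[g_1(X), g_2(Y)] - \sum_{n=1}^{N} \rho_n \alpha_n \beta_n;$$

thus, squaring the above identity and applying the Cauchy-Schwarz inequality in the
resulting squared expectation we conclude, in view of (8) and (9), that

$$\mathrm{Cov}[g_1(X), g_2(Y)] = \sum_{n=1}^{\infty} \rho_n \alpha_n \beta_n. \tag{11}$$

Therefore, combining the above we get the expression

$$\mathrm{Corr}[g_1(X), g_2(Y)] = \frac{\sum_{n=1}^{\infty} \rho_n \alpha_n \beta_n}{\sqrt{\sum_{n=1}^{\infty} \alpha_n^2}\,\sqrt{\sum_{n=1}^{\infty} \beta_n^2}}. \tag{12}$$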
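Now observe that, in view of (11),

$$\begin{aligned}
\big(\mathrm{Cov}[g_1(X), g_2(Y)]\big)^2
&\le \Big(\sum_{n=1}^{\infty} |\rho_n||\alpha_n||\beta_n|\Big)^2
 = \Big(\sum_{n=1}^{\infty} \big(\sqrt{|\rho_n|}\,|\alpha_n|\big)\big(\sqrt{|\rho_n|}\,|\beta_n|\big)\Big)^2 \\
&\le \Big(\sum_{n=1}^{\infty} |\rho_n| \alpha_n^2\Big)\Big(\sum_{n=1}^{\infty} |\rho_n| \beta_n^2\Big)
 \le \Big(\sup_{n \ge 1} |\rho_n|\Big)^2 \Big(\sum_{n=1}^{\infty} \alpha_n^2\Big)\Big(\sum_{n=1}^{\infty} \beta_n^2\Big).
\end{aligned}$$

The above inequality, combined with (12), shows that

$$R(X,Y) \le \sup_{n \ge 1} |\rho_n| = R, \text{ say}.$$

On the other hand, for any $\epsilon > 0$ we can find an index $n_0$ such that $|\rho_{n_0}| > R - \epsilon$,
and thus, $|\mathrm{Corr}[\phi_{n_0}(X), \psi_{n_0}(Y)]| = |\rho_{n_0}| > R - \epsilon$. Therefore,

$$\begin{aligned}
R(X,Y) &= \sup \mathrm{Corr}[g_1(X), g_2(Y)] \\
&\ge \max\{\mathrm{Corr}[\phi_{n_0}(X), \psi_{n_0}(Y)],\ \mathrm{Corr}[-\phi_{n_0}(X), \psi_{n_0}(Y)]\} \\
&= \max\{\rho_{n_0}, -\rho_{n_0}\} = |\rho_{n_0}| > R - \epsilon.
\end{aligned}$$

Since $\epsilon > 0$ is arbitrary it follows that $R(X,Y) \ge R$, and thus, $R(X,Y) = R$.
Finally, it is obvious that if the sequence $\{|\rho_n|\}_{n=1}^{\infty}$ has a unique maximum, say $|\rho_{n_0}|$,
then, working as above, it is easily seen that

$$\big(\mathrm{Cov}[g_1(X), g_2(Y)]\big)^2 \le \rho_{n_0}^2 \Big(\sum_{n=1}^{\infty} \alpha_n^2\Big)\Big(\sum_{n=1}^{\infty} \beta_n^2\Big) = \rho_{n_0}^2\, \mathrm{Var}\,g_1(X)\, \mathrm{Var}\,g_2(Y),$$

with equality if and only if $\alpha_n = \beta_n = 0$ for all $n \ge 1$, $n \ne n_0$; this, combined with
the fact that $\rho_{n_0}$ $(= \mathrm{Corr}[\phi_{n_0}(X), \psi_{n_0}(Y)])$ has the sign of $A_{n_0}$, completes the proof.
□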



3 Examples providing known characterizations via maximal correlation 

The following known results are immediate applications of Theorem 2.1.

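The bivariate normal case. Assumptions A1-A3 are easily checked for the bivariate
normal. Indeed, if (X, Y) is bivariate normal with $\mathrm{E}(X) = \mu_1$, $\mathrm{E}(Y) = \mu_2$, $\mathrm{Var}(X) =
\sigma_1^2 > 0$, $\mathrm{Var}(Y) = \sigma_2^2 > 0$ and $\mathrm{Corr}(X,Y) = \rho \in [-1, 1]$ then it is well-known that
$(X \mid Y = y) \sim N\big(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2),\ (1 - \rho^2)\sigma_1^2\big)$; this means that

$$(X \mid Y = y) \stackrel{d}{=} \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2) + \sigma_1\sqrt{1 - \rho^2}\,Z,$$

where $Z \sim N(0, 1)$ and $\stackrel{d}{=}$ denotes equality in distribution. Therefore,

$$\mathrm{E}[X^n \mid Y = y] = \mathrm{E}\Big[\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2) + \sigma_1\sqrt{1 - \rho^2}\,Z\Big]^n = \rho^n \frac{\sigma_1^n}{\sigma_2^n}\, y^n + \mathrm{Pol}_{n-1}(y),$$

that is,

$$\mathrm{E}[X^n \mid Y] = A_n Y^n + \mathrm{Pol}_{n-1}(Y), \quad\text{where } A_n = \rho^n \frac{\sigma_1^n}{\sigma_2^n}, \qquad n = 1, 2, \ldots.$$

Similarly,

$$\mathrm{E}[Y^n \mid X] = B_n X^n + \mathrm{Pol}_{n-1}(X), \quad\text{where } B_n = \rho^n \frac{\sigma_2^n}{\sigma_1^n}, \qquad n = 1, 2, \ldots.$$

Thus, A3 is satisfied, while A1 and A2 are obvious. It follows from (6) that $|\rho_n| =
\sqrt{A_n B_n} = |\rho|^n$, $\rho_n = \mathrm{sign}(\rho^n)|\rho|^n = \rho^n$, and, by (7), $R(X,Y) = \sup_{n \ge 1} |\rho_n| =
\max_{n \ge 1} |\rho|^n = |\rho|$; moreover, if $0 < |\rho| < 1$, the equality in the inequality

$$|\mathrm{Corr}[g_1(X), g_2(Y)]| \le |\rho|$$

holds if and only if both $g_1$ and $g_2$ are linear. On the other hand it is worth noting
that (11) takes here the form (cf. Afendras et al. (2011))

$$\mathrm{Cov}[g_1(X), g_2(Y)] = \sum_{n=1}^{\infty} \frac{\rho^n \sigma_1^n \sigma_2^n}{n!}\, \mathrm{E}[g_1^{(n)}(X)]\, \mathrm{E}[g_2^{(n)}(Y)], \tag{13}$$

provided that $g_1, g_2 \in C^{\infty}$, $g_1(X) \in L^2(X)$, $g_2(Y) \in L^2(Y)$, and that $\mathrm{E}|g_1^{(n)}(X)| < \infty$
and $\mathrm{E}|g_2^{(n)}(Y)| < \infty$ for all n, where $g_i^{(n)}$ denotes the n-th derivative of $g_i$, $i = 1, 2$.
Of course, one can apply (13) to the case X = Y; then $\mu_1 = \mu_2 = \mu$, say, $\rho = 1$,
$\sigma_1 = \sigma_2 = \sigma$, say, and (13) yields the generalized Stein identity for the $N(\mu, \sigma^2)$
distribution:

$$\mathrm{Cov}[g_1(X), g_2(X)] = \sum_{n=1}^{\infty} \frac{\sigma^{2n}}{n!}\, \mathrm{E}[g_1^{(n)}(X)]\, \mathrm{E}[g_2^{(n)}(X)].$$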
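
As a quick sanity check of (13), one can try quadratic functions. The choice $g_1(x) = x^2$, $g_2(y) = y^2$ below is an illustrative assumption and not an example taken from the text; only the terms $n = 1$ and $n = 2$ of the series survive.

```latex
\begin{align*}
g_1(x) = x^2,\quad g_2(y) = y^2
 &\;\Longrightarrow\; g_i'(t) = 2t,\quad g_i''(t) = 2,\quad g_i^{(n)} \equiv 0 \ (n \ge 3),\\
\mathrm{Cov}(X^2, Y^2)
 &= \rho\,\sigma_1\sigma_2\,(2\mu_1)(2\mu_2)
    + \frac{\rho^2\sigma_1^2\sigma_2^2}{2!}\cdot 2\cdot 2
  = 4\rho\,\sigma_1\sigma_2\,\mu_1\mu_2 + 2\rho^2\sigma_1^2\sigma_2^2 .
\end{align*}
```

For $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$ this reduces to the familiar value $\mathrm{Cov}(X^2, Y^2) = 2\rho^2$ for a standard bivariate normal pair.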

Characterization of rectangular distributions via maximal correlation of order
statistics. Terrell (1983), using Legendre polynomials, proved that if $X_{1:2} \le X_{2:2}$ are
the order statistics from two iid observations from a distribution with finite variance
then

$$\mathrm{Corr}(X_{1:2}, X_{2:2}) \le \tfrac{1}{2},$$


and the equality characterizes the rectangular (uniform over some non-degenerate
finite interval) distributions. However, Theorem 2.1 applies immediately here. Indeed,
if $U(a,b)$ denotes the uniform distribution over $(a, b)$ and if $U_1, U_2 \sim U(0, 1)$
then it is obvious that the order statistics $U_{1:2} \le U_{2:2}$ satisfy the following:

$$U_{1:2} \mid U_{2:2} \sim U(0, U_{2:2}) \;\Rightarrow\; \mathrm{E}[U_{1:2}^n \mid U_{2:2}] = \int_0^{U_{2:2}} \frac{t^n}{U_{2:2}}\,dt = \frac{1}{n+1}\,U_{2:2}^n,$$

$$U_{2:2} \mid U_{1:2} \sim U(U_{1:2}, 1) \;\Rightarrow\; \mathrm{E}[U_{2:2}^n \mid U_{1:2}] = \int_{U_{1:2}}^{1} \frac{t^n}{1 - U_{1:2}}\,dt = \frac{1}{n+1}\big(1 + U_{1:2} + \cdots + U_{1:2}^n\big).$$

Thus, $A_n = B_n = \frac{1}{n+1}$ and $|\rho_n| = \frac{1}{n+1}$. Therefore, $\max_{n \ge 1} |\rho_n| = |\rho_1| = \frac{1}{2}$. It follows
from Theorem 2.1 that $\mathrm{Corr}[g(U_{1:2}), g(U_{2:2})] \le \frac{1}{2}$, with equality if and only if g is
linear. Since for the order statistics $X_{1:2} \le X_{2:2}$ from an arbitrary distribution F it
is true that

$$(X_{1:2}, X_{2:2}) \stackrel{d}{=} (g(U_{1:2}), g(U_{2:2})), \quad\text{where } g(u) = \inf\{x : F(x) \ge u\},\ 0 < u < 1,$$

(the above g is usually denoted as $F^{-1}$), Terrell's result follows. The above argument
can be easily extended to provide the characterization of Szekely and Mori (1985),
who showed, using Jacobi polynomials, that for any integers $1 \le i < j \le n$,

$$\mathrm{Corr}(X_{i:n}, X_{j:n}) \le \sqrt{\frac{i(n+1-j)}{j(n+1-i)}},$$

with equality if and only if the random sample arises from a rectangular distribution.
Indeed, setting $g(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\}$, $0 < u < 1$, where F is the
common distribution function of the iid rv's $X_1, \ldots, X_n$, we have

$$(X_{i:n}, X_{j:n}) \stackrel{d}{=} (g(U_{i:n}), g(U_{j:n})) \quad\text{and, thus,}\quad \mathrm{Corr}(X_{i:n}, X_{j:n}) = \mathrm{Corr}(g(U_{i:n}), g(U_{j:n})),$$

which is well defined whenever $0 < \mathrm{Var}\,X_{i:n} + \mathrm{Var}\,X_{j:n} < \infty$. Since for any $s \in (0, 1)$,
$(U_{i:n} \mid U_{j:n} = s) \stackrel{d}{=} \tilde{U}_{i:j-1}$, where $\tilde{U}_{i:m}$ is the i-th order statistic from a sample of size
m from $U(0, s)$, we have

$$\tilde{U}_{i:j-1} \stackrel{d}{=} s\,U_{i:j-1} \;\Rightarrow\; \mathrm{E}[U_{i:n}^k \mid U_{j:n} = s] = \mathrm{E}[(s\,U_{i:j-1})^k] = s^k\,\mathrm{E}[U_{i:j-1}^k].$$

Now calculate

$$\mathrm{E}[U_{i:j-1}^k] = \frac{B(k+i,\, j-i)}{B(i,\, j-i)} = \frac{(k+i-1)!\,(j-1)!}{(k+j-1)!\,(i-1)!}.$$

Also, for any $t \in (0, 1)$ we have $(U_{j:n} \mid U_{i:n} = t) \stackrel{d}{=} \tilde{U}_{j-i:n-i}$, where $\tilde{U}_{j-i:n-i}$ is the $(j-i)$-th
order statistic from a sample of size $n - i$ from $U(t, 1)$. Clearly, if $\tilde{U} \sim U(t, 1)$ then
$\tilde{U} \stackrel{d}{=} t + (1 - t)U$ where $U \sim U(0, 1)$. Therefore, $(U_{j:n} \mid U_{i:n} = t) \stackrel{d}{=} t + (1 - t)\,U_{j-i:n-i}$ and
since $U_{j-i:n-i} \stackrel{d}{=} 1 - U_{n+1-j:n-i}$, we get $(U_{j:n} \mid U_{i:n} = t) \stackrel{d}{=} 1 - U_{n+1-j:n-i} + t\,U_{n+1-j:n-i}$.

Therefore,

$$\mathrm{E}[U_{j:n}^k \mid U_{i:n} = t] = \mathrm{E}\big[(1 - U_{n+1-j:n-i} + t\,U_{n+1-j:n-i})^k\big] = t^k\,\mathrm{E}[U_{n+1-j:n-i}^k] + \mathrm{Pol}_{k-1}(t) = \frac{(n+k-j)!\,(n-i)!}{(n+k-i)!\,(n-j)!}\,t^k + \mathrm{Pol}_{k-1}(t).$$

Thus, we found that assumption A3 is satisfied with

$$A_k = \frac{(k+i-1)!\,(j-1)!}{(k+j-1)!\,(i-1)!} = \frac{[i]_k}{[j]_k},$$

where $[\alpha]_k = \alpha(\alpha + 1) \cdots (\alpha + k - 1)$, and

$$B_k = \frac{(n+k-j)!\,(n-i)!}{(n+k-i)!\,(n-j)!} = \frac{[n+1-j]_k}{[n+1-i]_k}.$$
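Hence,

$$\rho_k^2 = A_k B_k = \frac{[i]_k\,[n+1-j]_k}{[j]_k\,[n+1-i]_k}.$$

This is a strictly decreasing sequence in k, and Theorem 2.1 yields

$$\mathrm{Corr}(X_{i:n}, X_{j:n}) \le \sqrt{\rho_1^2} = \sqrt{\frac{i(n+1-j)}{j(n+1-i)}},$$

with equality if and only if $g(u)\,(= F^{-1}(u)) = au + \beta$ for some $a > 0$ and $\beta \in \mathbb{R}$,
i.e., $X \sim U(\beta, \beta + a)$, $a > 0$.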
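
The sequence $\rho_k^2$ above is easy to tabulate exactly. The following is a minimal sketch using exact rational arithmetic; the helper names `rising` and `rho_sq_order_stats` and the sample values of i, j, n are assumptions chosen here for illustration.

```python
from fractions import Fraction

def rising(a: int, k: int) -> int:
    """Ascending factorial [a]_k = a (a+1) ... (a+k-1), with [a]_0 = 1."""
    result = 1
    for r in range(k):
        result *= a + r
    return result

def rho_sq_order_stats(i: int, j: int, n: int, k: int) -> Fraction:
    """rho_k^2 = A_k * B_k for the uniform order statistics (U_{i:n}, U_{j:n})."""
    a_k = Fraction(rising(i, k), rising(j, k))                   # A_k = [i]_k / [j]_k
    b_k = Fraction(rising(n + 1 - j, k), rising(n + 1 - i, k))   # B_k
    return a_k * b_k

# Hypothetical example: the 2nd and 4th order statistics in a sample of size 6.
i, j, n = 2, 4, 6
values = [rho_sq_order_stats(i, j, n, k) for k in range(1, 9)]
print(values[0])  # equals i(n+1-j) / (j(n+1-i))
print(all(x > y for x, y in zip(values, values[1:])))  # strictly decreasing
```

For $i = 1$, $j = n = 2$ the same function returns $1/(k+1)^2$, in agreement with $|\rho_n| = 1/(n+1)$ found for Terrell's case.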

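The same simple arguments apply to the case where (X, Y) has a density as in
(2). Then, it is easily seen that for any fixed x and y in (0, 1),

$$(X \mid Y = y) \stackrel{d}{=} y\,B_{\alpha,\beta} \quad\text{and}\quad (Y \mid X = x) \stackrel{d}{=} x + (1 - x)B_{\beta,\gamma} \stackrel{d}{=} 1 - B_{\gamma,\beta} + x\,B_{\gamma,\beta},$$

where $B_{r,s}$ denotes a Beta random variable with parameters $r > 0$ and $s > 0$. It
follows that

$$\mathrm{E}(X^n \mid Y) = A_n Y^n \quad\text{and}\quad \mathrm{E}(Y^n \mid X) = B_n X^n + \mathrm{Pol}_{n-1}(X)$$

with

$$A_n = \mathrm{E}[B_{\alpha,\beta}^n] = \frac{[\alpha]_n}{[\alpha+\beta]_n} \quad\text{and}\quad B_n = \mathrm{E}[B_{\gamma,\beta}^n] = \frac{[\gamma]_n}{[\beta+\gamma]_n}.$$

Since $\rho_n^2 = A_n B_n = \frac{[\alpha]_n [\gamma]_n}{[\alpha+\beta]_n [\beta+\gamma]_n}$ is a strictly decreasing function in n, Theorem 2.1
yields $R(X,Y) = |\rho_1| = \rho_1 = \rho(X,Y)$, which is (3).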

Nevzorov's characterization of exponential distribution. Nevzorov (1992) proved
that for any $n, m \in \{1, 2, \ldots\}$,

$$\mathrm{Corr}(R_n, R_{n+m}) \le \sqrt{\frac{n}{n+m}},$$

where $R_i$ is the i-th (upper) record from a continuous distribution F with finite
variance (here $R_1 = X_1$ is the first observed random variable in the iid sequence
$\{X_i\}_{i=1}^{\infty}$). Moreover, equality characterizes the location-scale family of the standard
exponential distribution. Theorem 2.1 gives the result immediately. Indeed, if $W_i$
denotes the i-th record from $\mathcal{E}xp(1)$ (with density $f(x) = e^{-x}$, $x > 0$) then

$$(W_n, W_{n+m}) \stackrel{d}{=} (E_1 + \cdots + E_n,\ E_1 + \cdots + E_{n+m}), \qquad n, m \in \{1, 2, \ldots\},$$

where $\{E_i\}_{i=1}^{\infty}$ is an iid sequence from $\mathcal{E}xp(1)$ - see, e.g., Arnold et al. (1998). Setting
$X = E_1 + \cdots + E_n$ and $Y = E_1 + \cdots + E_{n+m}$, the joint density of (X, Y) is

$$f_{X,Y}(x,y) = \frac{1}{\Gamma(n)\Gamma(m)}\, x^{n-1}(y - x)^{m-1} e^{-y}, \qquad 0 < x < y < \infty,$$

and the conditional densities are

$$f_{X|Y}(x \mid y) = \frac{\Gamma(n+m)}{\Gamma(n)\Gamma(m)}\, x^{n-1}(y - x)^{m-1} y^{-(n+m-1)}, \qquad x \in (0, y),$$

and

$$f_{Y|X}(y \mid x) = \frac{1}{\Gamma(m)}\,(y - x)^{m-1} e^{-(y-x)}, \qquad y \in (x, \infty).$$

It follows that

$$\mathrm{E}(X^k \mid Y) = \frac{(k+n-1)!\,(n+m-1)!}{(k+n+m-1)!\,(n-1)!}\, Y^k$$

and

$$\mathrm{E}(Y^k \mid X = x) = \frac{1}{\Gamma(m)} \int_x^{\infty} y^k (y - x)^{m-1} e^{-(y-x)}\,dy = x^k + \mathrm{Pol}_{k-1}(x).$$

Thus, A3 is satisfied with $A_k = \frac{(k+n-1)!\,(n+m-1)!}{(k+n+m-1)!\,(n-1)!}$ and $B_k = 1$, so that

$$\rho_k^2 = A_k B_k = \frac{(k+n-1)!\,(n+m-1)!}{(k+n+m-1)!\,(n-1)!} = \frac{[n]_k}{[n+m]_k}.$$

Since this is a strictly decreasing sequence in k, Theorem 2.1 yields the inequality

$$\mathrm{Corr}(R_n, R_{n+m}) = \mathrm{Corr}(g(W_n), g(W_{n+m})) \le \sqrt{\rho_1^2} = \sqrt{\frac{n}{n+m}},$$

where $g(u) = F^{-1}(1 - e^{-u})$, $u > 0$. The equality holds if and only if g is increasing
and linear, that is, if and only if F is the distribution of $aE + \beta$ where $a > 0$, $\beta \in \mathbb{R}$
and $E \sim \mathcal{E}xp(1)$.
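
A small worked instance may help fix ideas. The choice $n = m = 1$ below (the first two records) is an illustrative assumption made here; it specializes $\rho_k^2 = [n]_k / [n+m]_k$ and checks the bound directly for the standard exponential.

```latex
\begin{align*}
\rho_k^2 &= \frac{[1]_k}{[2]_k} = \frac{k!}{(k+1)!} = \frac{1}{k+1},
  \qquad k = 1, 2, \ldots,\\
\mathrm{Corr}(R_1, R_2) &\le \sqrt{\rho_1^2} = \frac{1}{\sqrt{2}},\\
\text{and for } F = \mathcal{E}xp(1):\quad
\mathrm{Corr}(W_1, W_2)
 &= \frac{\mathrm{Cov}(E_1,\, E_1 + E_2)}{\sqrt{\mathrm{Var}(E_1)\,\mathrm{Var}(E_1 + E_2)}}
  = \frac{1}{\sqrt{1 \cdot 2}} = \frac{1}{\sqrt{2}} .
\end{align*}
```

So the bound is attained by the exponential parent, as the characterization asserts.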

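López-Blázquez and Castaño-Martínez result on maximal correlation of order
statistics from a finite population. Let $U_{1:n}^{(N)} < U_{2:n}^{(N)} < \cdots < U_{n:n}^{(N)}$ be the order
statistics corresponding to a simple random sample, $U_1^{(N)}, \ldots, U_n^{(N)}$, taken without
replacement from the finite ordered population $\Pi_N = \{1, 2, \ldots, N\}$, where $2 \le n \le N$.
Since $\mathrm{P}(U_{i:n}^{(N)} = k) = \binom{k-1}{i-1}\binom{N-k}{n-i}\binom{N}{n}^{-1}$ for $k \in \{i, i+1, \ldots, N - (n-i)\}$ (and 0
otherwise), and this defines a probability mass function with support $A_i^{(N)} :=
\{i, i+1, \ldots, N - (n-i)\}$, we conclude the identity

$$\sum_{k=i}^{N-(n-i)} \binom{k-1}{i-1}\binom{N-k}{n-i} = \binom{N}{n}. \tag{14}$$

Setting $[\alpha]_m = \alpha(\alpha + 1) \cdots (\alpha + m - 1)$ (with $[\alpha]_0 = 1$ for all $\alpha \in \mathbb{R}$), we can derive,
with the help of (14), a simple expression for the ascending moments of $U_{i:n}^{(N)}$:

$$\mathrm{E}\big\{[U_{i:n}^{(N)}]_m\big\} = [N+1]_m\,\frac{[i]_m}{[n+1]_m}, \qquad m = 1, 2, \ldots. \tag{15}$$

We also mention the following obvious relations, holding for all $1 \le i < j \le n$:

$$\big(U_{i:n}^{(N)}, U_{j:n}^{(N)}\big) \stackrel{d}{=} \big(N + 1 - U_{n+1-i:n}^{(N)},\ N + 1 - U_{n+1-j:n}^{(N)}\big), \tag{16}$$

$$\big(U_{i:n}^{(N)} \mid U_{j:n}^{(N)} = s\big) \stackrel{d}{=} U_{i:j-1}^{(s-1)}, \qquad s \in \{j, j+1, \ldots, N - (n-j)\}, \tag{17}$$

$$\big(U_{j:n}^{(N)} \mid U_{i:n}^{(N)} = k\big) \stackrel{d}{=} k + U_{j-i:n-i}^{(N-k)}, \qquad k \in \{i, i+1, \ldots, N - (n-i)\}. \tag{18}$$

Now, by (15) and (17) we get

$$\mathrm{E}\big\{[U_{i:n}^{(N)}]_m \mid U_{j:n}^{(N)} = s\big\} = [s]_m\,\frac{[i]_m}{[j]_m}, \qquad m = 1, 2, \ldots. \tag{19}$$

Let $(X, Y) = (U_{i:n}^{(N)}, U_{j:n}^{(N)})$. Relation (19) shows that

$$\mathrm{E}([X]_m \mid Y) = \frac{[i]_m}{[j]_m}\,[Y]_m = \frac{[i]_m}{[j]_m}\,Y^m + \mathrm{Pol}_{m-1}(Y), \qquad m = 1, 2, \ldots,$$

and this implies, using induction on m, that

$$\mathrm{E}(X^m \mid Y) = \frac{[i]_m}{[j]_m}\,Y^m + \mathrm{Pol}_{m-1}(Y), \qquad m = 1, 2, \ldots. \tag{20}$$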
Similarly, setting $i' = n + 1 - j$, $j' = n + 1 - i$ (so that $1 \le i' < j' \le n$), writing $U_{i'}$
instead of $U_{i':n}^{(N)}$, $U_{j'}$ instead of $U_{j':n}^{(N)}$, and applying relations (16) and (19), we get

$$\begin{aligned}
\mathrm{E}([Y]_m \mid X = k) &= \mathrm{E}\{[N + 1 - U_{i'}]_m \mid U_{j'} = N + 1 - k\} \\
&= \mathrm{E}\{(-1)^m [U_{i'}]_m + \mathrm{Pol}_{m-1}(U_{i'}) \mid U_{j'} = N + 1 - k\} \\
&= (-1)^m\,\mathrm{E}\{[U_{i'}]_m \mid U_{j'} = N + 1 - k\} + \mathrm{Pol}_{m-1}(N + 1 - k) \\
&= (-1)^m [N + 1 - k]_m\,\frac{[i']_m}{[j']_m} + \mathrm{Pol}_{m-1}(k) \\
&= \frac{[i']_m}{[j']_m}\,[k]_m + \mathrm{Pol}_{m-1}(k) = \frac{[n+1-j]_m}{[n+1-i]_m}\,[k]_m + \mathrm{Pol}_{m-1}(k).
\end{aligned}$$

It follows that $\mathrm{E}([Y]_m \mid X) = \frac{[n+1-j]_m}{[n+1-i]_m}[X]_m + \mathrm{Pol}_{m-1}(X) = \frac{[n+1-j]_m}{[n+1-i]_m}X^m + \mathrm{Pol}_{m-1}(X)$
and, finally, using induction on m, we get the expression

$$\mathrm{E}(Y^m \mid X) = \frac{[n+1-j]_m}{[n+1-i]_m}\,X^m + \mathrm{Pol}_{m-1}(X), \qquad m = 1, 2, \ldots. \tag{21}$$

Clearly, (20) and (21) show that A3 is satisfied for (X, Y). Moreover, we have found
that $A_m = \frac{[i]_m}{[j]_m}$ and $B_m = \frac{[n+1-j]_m}{[n+1-i]_m}$ (neither depends on N). Hence, since
$\rho_m^2 = A_m B_m$ is a strictly decreasing sequence in m, we obtain from Theorem 2.1 the
inequality

$$\mathrm{Corr}\big[g_1(U_{i:n}^{(N)}),\ g_2(U_{j:n}^{(N)})\big] \le \sqrt{A_1 B_1} = \sqrt{\frac{i(n+1-j)}{j(n+1-i)}},$$

in which the equality holds if and only if both $g_1$ and $g_2$ are (non-constant and)
linear and with the same monotonicity - more precisely, the restriction of $g_1$ to
the set $A_i^{(N)}$ has to be non-constant and linear and the restriction of $g_2$ to the set
$A_j^{(N)}$ has to be non-constant and linear and with the same monotonicity as $g_1$; note
that both sets $A_i^{(N)}$ and $A_j^{(N)}$ contain at least two points if and only if $N \ge n + 1$.
Lemma 2.1 of Balakrishnan et al. (2003) asserts that for the non-decreasing function
$g : \{1, 2, \ldots, N\} \to \{x_1 \le x_2 \le \cdots \le x_N\} := \Pi_N$ with $g(i) = x_i$, $i = 1, 2, \ldots, N$, it
is true that

$$\big(g(U_{i:n}^{(N)}),\ g(U_{j:n}^{(N)})\big) \stackrel{d}{=} (X_{i:n}, X_{j:n}), \qquad 1 \le i < j \le n,$$

where $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ are the order statistics corresponding to a simple
random sample drawn (without replacement) from the finite population $\Pi_N$. Suppose
that $\mathrm{Corr}(X_{i:n}, X_{j:n})$ is well-defined or, equivalently, that the elements of $\Pi_N$ satisfy
$x_i < x_{N-(n-i)}$ and $x_j < x_{N-(n-j)}$ (otherwise, at least one of $X_{i:n}$, $X_{j:n}$ would be
degenerate). Then we conclude that

$$\mathrm{Corr}(X_{i:n}, X_{j:n}) \le \sqrt{\frac{i(n+1-j)}{j(n+1-i)}}, \qquad 1 \le i < j \le n < N, \tag{22}$$

and the equality (for fixed i, j, n, N) characterizes those finite populations $\Pi_N$
for which the sets $\{x_i, x_{i+1}, \ldots, x_{N-(n-i)}\}$ and $\{x_j, x_{j+1}, \ldots, x_{N-(n-j)}\}$ (that may or
may not have common points) consist of consecutive terms of two (possibly different)
strictly increasing arithmetic progressions. That is, a population of size N with
elements $x_1 \le x_2 \le \cdots \le x_N$ satisfying $x_i < x_{N-(n-i)}$ and $x_j < x_{N-(n-j)}$ attains
the equality in (22) if and only if there exist constants $a_1 > 0$, $b_1 \in \mathbb{R}$, $a_2 > 0$ and
$b_2 \in \mathbb{R}$ such that

$$x_k = \begin{cases} a_1 k + b_1, & \text{for } k = i, i+1, \ldots, N - (n-i), \\ a_2 k + b_2, & \text{for } k = j, j+1, \ldots, N - (n-j), \\ \text{arbitrary}, & \text{otherwise}. \end{cases}$$

López-Blázquez and Castaño-Martínez (2006), using Hahn polynomials, have obtained
a corresponding inequality for the correlation ratio, which implies inequality
(22); their arguments, however, apply to populations $\Pi_N$ having N distinct elements.
We also refer to Theorem 2.1 and Corollary 2.1 in Castaño-Martínez et al.
(2007), noting that the characterization result stated in Corollary 2.1 of that article
is incomplete, unless the sets $A_i^{(N)}$ and $A_j^{(N)}$ have at least two common points, i.e.,
$N \ge n + (j - i) + 1$.
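
For small populations the bound (22) can be checked by brute-force enumeration. The sketch below is an illustration under stated assumptions: the function names `corr_sq_finite` and `bound_sq` are introduced here, the population is $\{1, \ldots, N\}$ transformed by a user-supplied g, and squared correlations are compared so that all arithmetic stays exact.

```python
from fractions import Fraction
from itertools import combinations

def corr_sq_finite(N: int, n: int, i: int, j: int, g=lambda k: k) -> Fraction:
    """Exact squared correlation of g(U_{i:n}), g(U_{j:n}) for simple random
    sampling without replacement from {1, ..., N}."""
    samples = list(combinations(range(1, N + 1), n))  # each tuple is sorted
    total = len(samples)
    xs = [Fraction(g(s[i - 1])) for s in samples]     # i-th order statistic
    ys = [Fraction(g(s[j - 1])) for s in samples]     # j-th order statistic
    ex, ey = sum(xs) / total, sum(ys) / total
    cov = sum(x * y for x, y in zip(xs, ys)) / total - ex * ey
    var_x = sum(x * x for x in xs) / total - ex * ex
    var_y = sum(y * y for y in ys) / total - ey * ey
    return cov * cov / (var_x * var_y)

def bound_sq(n: int, i: int, j: int) -> Fraction:
    """Square of the right-hand side of (22)."""
    return Fraction(i * (n + 1 - j), j * (n + 1 - i))

print(corr_sq_finite(6, 3, 1, 3), bound_sq(3, 1, 3))    # linear g: equality
print(corr_sq_finite(6, 3, 1, 3, g=lambda k: k * k))    # nonlinear g: smaller
```

With the identity map the population is an arithmetic progression, so the bound is attained; the quadratic map is one hypothetical example of a population for which it is not.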



4 Records from a splitting model and a Nevzorov-type characterization 
of the exponential distribution 

Assume that in a particular country and for a specific athletic event, the consecutive
performances of the athletes are described by an iid sequence $\{X_i\}_{i=1}^{\infty}$. Here and
elsewhere in this section, the common distribution of each $X_i$ will be assumed to
be continuous, i.e., with no atoms - absolute continuity is not needed. As
time goes on, the common practice is that some data regarding the sequence of
national records, i.e., the sequence $\{R_i\}_{i=1}^{\infty}$, are saved (and recorded), in contrast to
the original performances of the athletes, $X_i$, which are usually lost or forgotten.
The above considerations give rise to the classical record model (based on an iid
sequence), which is well-developed in the literature; see, e.g., Arnold et al. (1998).
Under this classical model the observed sequence $\{R_i\}_{i=1}^{n}$ of the first n upper national
records is defined as $R_1 = X_1$ and $R_i = X_{T(i)}$, $i = 2, \ldots, n$, where $T(i) = \min\{j \in
\{1, 2, \ldots\} : X_j > R_{i-1}\}$.

Suppose now that, after the appearance of the n-th national record, the initial
country is divided into (say) two new countries (branches), and assume that the athletes
in each country are of the same strength as they were before the division. Then,
the subsequent national records in each branch will take into account the current
(common) national record, $R_n$, and the subsequent sequence of their individual
records will be of the form $(R'_{n+n_1}, R''_{n+n_2})$, with $n_1, n_2 \in \{1, 2, \ldots\}$. Clearly,

$$R'_{n+n_1} \stackrel{d}{=} R_{n+n_1} \quad\text{and}\quad R''_{n+n_2} \stackrel{d}{=} R_{n+n_2}, \tag{23}$$

where $R_{n+m}$ is the (n + m)-th record from the initial sequence, but as $n_1$ and $n_2$
become large, the random variables $R'_{n+n_1}$ and $R''_{n+n_2}$ should tend to be independent.
Thus, the actual definition of the splitting record sequence is equivalent to the
following model: Let $\{X_1, X_1', X_1'', X_2, X_2', X_2'', \ldots\}$ be an iid sequence of random
variables. Define the n-th upper record $R_n$ as before (based on the X's), then
set $R'_n = R''_n := R_n$ and $T'(n) = T''(n) := T(n)$, and for $i = 1, 2, \ldots$ define the
subsequent record times and record values by

$$T'(n+i) = \min\{j \in \{1, 2, \ldots\} : X'_j > R'_{n+i-1}\}, \qquad R'_{n+i} = X'_{T'(n+i)}, \quad\text{and}$$

$$T''(n+i) = \min\{j \in \{1, 2, \ldots\} : X''_j > R''_{n+i-1}\}, \qquad R''_{n+i} = X''_{T''(n+i)}.$$



Clearly, it is of some interest to study the correlation behavior of the marginal 
records under this model, since large correlation among these variables entails good 
prediction of the one branch to the other. It is not surprising that, similarly to the 
classic case, the splitting record sequence satisfies several interesting properties. In 
particular, in the sequel we shall make use of the following lemma, the proof of which 
is simple and is left to the reader - cf. Arnold et al. (1998). 

Lemma 4.1. (a) If $\{(W'_{n+n_1}, W''_{n+n_2})\}_{n_1, n_2 = 1}^{\infty}$ is the splitting record sequence based on
the iid sequence $\{E_i, E_i', E_i''\}_{i=1}^{\infty}$ from the standard exponential distribution, $\mathcal{E}xp(1)$,
then for each $n_1, n_2 \in \{1, 2, \ldots\}$,

$$(W'_{n+n_1}, W''_{n+n_2}) \stackrel{d}{=} (E_1 + \cdots + E_n + E_1' + \cdots + E'_{n_1},\ E_1 + \cdots + E_n + E_1'' + \cdots + E''_{n_2}). \tag{24}$$

(b) Let $\{(R'_{n+n_1}, R''_{n+n_2})\}_{n_1, n_2 = 1}^{\infty}$ be the splitting record sequence based on an iid
sequence $\{X_i, X_i', X_i''\}_{i=1}^{\infty}$ from a non-atomic (continuous) distribution function F.
Then, for each $n_1, n_2 \in \{1, 2, \ldots\}$,

$$(R'_{n+n_1}, R''_{n+n_2}) \stackrel{d}{=} (g(W'_{n+n_1}),\ g(W''_{n+n_2})), \tag{25}$$

where $g(u) = F^{-1}(1 - e^{-u})$, $u > 0$, with $F^{-1}(y) = \inf\{x : F(x) \ge y\}$, $y \in (0, 1)$.

With the help of Lemma 4.1, Theorem 2.1 yields the following characterization.

Theorem 4.1. If $(R'_{n+n_1}, R''_{n+n_2})$ are splitting records based on an iid sequence
$\{X_i\}_{i=1}^{\infty}$ from a non-atomic distribution F with $\mathrm{E}(R'_{n+n_1})^2 < \infty$ and $\mathrm{E}(R''_{n+n_2})^2 < \infty$
then

$$\mathrm{Corr}(R'_{n+n_1}, R''_{n+n_2}) \le \frac{n}{\sqrt{n + n_1}\,\sqrt{n + n_2}},$$

and the equality holds if and only if F is the distribution function of $aE + \beta$ for
some $a > 0$ and $\beta \in \mathbb{R}$, where $E \sim \mathcal{E}xp(1)$.

Proof: Set $X = E_1 + \cdots + E_n + E_1' + \cdots + E'_{n_1}$ and $Y = E_1 + \cdots + E_n + E_1'' + \cdots + E''_{n_2}$
with $(E_1, \ldots, E''_{n_2})$ being a vector of $n + n_1 + n_2$ iid standard exponential rv's. It can
be seen (see the proof of Theorem 4.2 below) that for all $k \in \{1, 2, \ldots\}$,

$$\mathrm{E}(X^k \mid Y) = \frac{[n]_k}{[n + n_2]_k}\,Y^k + \mathrm{Pol}_{k-1}(Y), \qquad \mathrm{E}(Y^k \mid X) = \frac{[n]_k}{[n + n_1]_k}\,X^k + \mathrm{Pol}_{k-1}(X).$$

That is, the random vector (X, Y) has the polynomial regression property with
$A_k = [n]_k / [n + n_2]_k$ and $B_k = [n]_k / [n + n_1]_k$. Clearly, $\rho_k^2 = ([n]_k)^2 / ([n + n_1]_k [n + n_2]_k)$
is strictly decreasing in k. In view of Lemma 4.1, Theorem 2.1 shows that, with
$g(u) = F^{-1}(1 - e^{-u})$,

$$\mathrm{Corr}(R'_{n+n_1}, R''_{n+n_2}) = \mathrm{Corr}(g(W'_{n+n_1}), g(W''_{n+n_2})) = \mathrm{Corr}(g(X), g(Y)) \le \frac{n}{\sqrt{n + n_1}\,\sqrt{n + n_2}},$$

and the equality holds if and only if $g : (0, \infty) \to \mathbb{R}$ is linear; this, together with the
fact that g is non-decreasing, completes the proof. □
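
The bound of Theorem 4.1 can be illustrated by simulation in the exponential case, where equality should hold. The sketch below samples directly from representation (24) rather than simulating the record process itself; the function names, the seed, the number of replications and the example values of n, n1, n2 are all assumptions made for illustration.

```python
import math
import random

def simulate_split_corr(n: int, n1: int, n2: int,
                        reps: int = 50_000, seed: int = 12345) -> float:
    """Monte Carlo estimate of Corr(W'_{n+n1}, W''_{n+n2}) using (24)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(reps):
        common = sum(rng.expovariate(1.0) for _ in range(n))   # shared E_1..E_n
        xs.append(common + sum(rng.expovariate(1.0) for _ in range(n1)))
        ys.append(common + sum(rng.expovariate(1.0) for _ in range(n2)))
    mx, my = sum(xs) / reps, sum(ys) / reps
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / reps
    vx = sum((x - mx) ** 2 for x in xs) / reps
    vy = sum((y - my) ** 2 for y in ys) / reps
    return cov / math.sqrt(vx * vy)

def split_bound(n: int, n1: int, n2: int) -> float:
    """Right-hand side of the inequality in Theorem 4.1."""
    return n / math.sqrt((n + n1) * (n + n2))

estimate = simulate_split_corr(3, 2, 4)
print(estimate, split_bound(3, 2, 4))  # the two values should be close
```

Sampling from (24) avoids the very long waiting times between successive records; it is a convenience of this sketch, not a claim about how the records are observed in practice.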

Theorem 4.1 and Nevzorov's (1992) characterization reflect the polynomial regression
property of a specific class of multivariate Gamma random vectors, provided
that every component is representable as a sum of independent Gamma rv's with
the same scale parameter, say $1/\lambda$. Recall that a random variable X follows a
Gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$ if its density is given by

$$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1} e^{-\lambda x}, \qquad x > 0;$$

this fact is denoted by $X \sim \Gamma(\alpha; \lambda)$, while the notation $X \sim \Gamma(0; \lambda)$ (for some $\lambda > 0$)
means that X is degenerate and takes the value zero w.p. 1. In any case, $\mathrm{E}X = \alpha/\lambda$
and $\mathrm{Var}\,X = \alpha/\lambda^2$. Under the above notation one can easily verify the following
result, which essentially contains both Theorem 4.1 and Nevzorov's characterization
as particular cases.

Theorem 4.2. Let $X_i \sim \Gamma(\alpha_i; \lambda)$ (i = 0, 1, 2) be independent rv's with $\lambda > 0$,
$\alpha_i \ge 0$ (i = 0, 1, 2) and $\alpha_0 + \alpha_i > 0$ (i = 1, 2). Then the random vector $(X, Y) =
(X_0 + X_1, X_0 + X_2)$ follows a bivariate distribution with Gamma marginals, namely
$X \sim \Gamma(\alpha_0 + \alpha_1; \lambda)$ and $Y \sim \Gamma(\alpha_0 + \alpha_2; \lambda)$. Moreover, (X, Y) has the polynomial
regression property: For all $n \in \{1, 2, \ldots\}$,

$$\mathrm{E}(X^n \mid Y) = \frac{[\alpha_0]_n}{[\alpha_0 + \alpha_2]_n}\,Y^n + \mathrm{Pol}_{n-1}(Y), \qquad \mathrm{E}(Y^n \mid X) = \frac{[\alpha_0]_n}{[\alpha_0 + \alpha_1]_n}\,X^n + \mathrm{Pol}_{n-1}(X),$$

where $[\alpha]_0 = 1$ for all $\alpha \in \mathbb{R}$ and $[\alpha]_k = \alpha(\alpha + 1) \cdots (\alpha + k - 1)$ (k = 1, 2, ...). Finally,
for any $g_1 \in L^2(X)$ with $\mathrm{Var}\,g_1(X) > 0$ and for any $g_2 \in L^2(Y)$ with $\mathrm{Var}\,g_2(Y) > 0$
we have the inequality

$$\mathrm{Corr}(g_1(X), g_2(Y)) \le \frac{\alpha_0}{\sqrt{\alpha_0 + \alpha_1}\,\sqrt{\alpha_0 + \alpha_2}},$$

where, provided that $\alpha_1 + \alpha_2 > 0$, the equality holds if and only if either $\alpha_0 = 0$
(and $g_1$, $g_2$ are arbitrary) or $\alpha_0 > 0$ and both $g_1$, $g_2$ are nonconstant, linear and with
the same monotonicity.
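Proof: Cases $\alpha_0 = 0$ and $\alpha_1 = \alpha_2 = 0$ are trivial (X, Y are independent and X = Y
w.p. 1, respectively). Both cases $\alpha_0 > 0$, $\alpha_1 = 0$, $\alpha_2 > 0$ and $\alpha_0 > 0$, $\alpha_1 > 0$, $\alpha_2 = 0$
are similar to Nevzorov's case and can be shown as in Section 3. Assume now that
$\alpha_i > 0$ for i = 0, 1, 2. Then, it is easily seen that the conditional density of X given
Y = y (for any fixed y > 0) is given by

$$f_{X|Y}(x \mid y) = c\,e^{-\lambda x} \int_0^{\min\{x,y\}} w^{\alpha_0 - 1}(x - w)^{\alpha_1 - 1}(y - w)^{\alpha_2 - 1} e^{\lambda w}\,dw, \qquad x > 0,$$

where

$$c = c(\alpha_0, \alpha_1, \alpha_2; \lambda; y) = \frac{\lambda^{\alpha_1}\,\Gamma(\alpha_0 + \alpha_2)}{y^{\alpha_0 + \alpha_2 - 1}\,\Gamma(\alpha_0)\Gamma(\alpha_1)\Gamma(\alpha_2)}.$$

Despite the fact that this conditional density is not given in a closed form, we can
calculate $\mathrm{E}(X^n \mid Y = y)$ using Tonelli's Theorem. Indeed, consider the nonnegative
functions $\theta(w) = w^{\alpha_0 - 1} e^{\lambda w} I(w > 0)$ and $h(x, y, w) = (x - w)^{\alpha_1 - 1}(y - w)^{\alpha_2 - 1} I(w <
\min\{x, y\})$ (x, y, w > 0). Then,

$$\begin{aligned}
\mathrm{E}(X^n \mid Y = y)
&= c\,\Big\{ \int_0^y x^n e^{-\lambda x} \int_0^x \theta(w) h(x,y,w)\,dw\,dx + \int_y^{\infty} x^n e^{-\lambda x} \int_0^y \theta(w) h(x,y,w)\,dw\,dx \Big\} \\
&= c\,\Big\{ \int_0^y \theta(w) \int_w^y x^n e^{-\lambda x} h(x,y,w)\,dx\,dw + \int_0^y \theta(w) \int_y^{\infty} x^n e^{-\lambda x} h(x,y,w)\,dx\,dw \Big\} \\
&= c \int_0^y \theta(w) \int_w^{\infty} x^n e^{-\lambda x} h(x,y,w)\,dx\,dw \\
&= c \int_0^y w^{\alpha_0 - 1}(y - w)^{\alpha_2 - 1} \int_0^{\infty} (x + w)^n e^{-\lambda x} x^{\alpha_1 - 1}\,dx\,dw.
\end{aligned}$$

Now, expanding $(x + w)^n$ according to Newton's formula and using $\int_0^{\infty} x^{j + \alpha_1 - 1} e^{-\lambda x}\,dx =
\Gamma(\alpha_1 + j) / \lambda^{\alpha_1 + j}$ (j = 0, 1, ..., n) we get for the inner integral the expression

$$\int_0^{\infty} (x + w)^n e^{-\lambda x} x^{\alpha_1 - 1}\,dx = \sum_{j=0}^{n} \binom{n}{j} \frac{\Gamma(\alpha_1 + j)}{\lambda^{\alpha_1 + j}}\,w^{n-j}.$$

Finally, substituting this expression to the double integral, above, we get

$$\begin{aligned}
\mathrm{E}(X^n \mid Y = y)
&= c \sum_{j=0}^{n} \binom{n}{j} \frac{\Gamma(\alpha_1 + j)}{\lambda^{\alpha_1 + j}} \cdot \frac{\Gamma(\alpha_2)\Gamma(\alpha_0 + n - j)}{\Gamma(\alpha_0 + \alpha_2 + n - j)}\,y^{\alpha_0 + \alpha_2 + (n - j) - 1} \\
&= \frac{\Gamma(\alpha_0 + \alpha_2)}{\Gamma(\alpha_0)\Gamma(\alpha_1)} \sum_{j=0}^{n} \binom{n}{j} \frac{\Gamma(\alpha_0 + j)\,\Gamma(\alpha_1 + n - j)}{\lambda^{n-j}\,\Gamma(\alpha_0 + \alpha_2 + j)}\,y^j.
\end{aligned}$$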
Therefore, X has polynomial regression on Y and, similarly, Y has polynomial regression
on X. It follows that (X, Y) satisfies conditions A1-A3 and, moreover,

$$\rho_n = \mathrm{sign}(A_n)\sqrt{A_n B_n} = \frac{[\alpha_0]_n}{\sqrt{[\alpha_0 + \alpha_1]_n}\,\sqrt{[\alpha_0 + \alpha_2]_n}} > 0;$$

since $|\rho_n| = \rho_n$ is strictly decreasing in n, an application of Theorem 2.1 completes
the proof. □
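
The Lancaster sequence of Theorem 4.2 is simple to evaluate numerically. In the minimal sketch below the names `rising` and `rho_gamma` are hypothetical helpers introduced here, and the parameter values are illustrative; note that $\lambda$ does not enter $\rho_n$.

```python
import math

def rising(a: float, k: int) -> float:
    """Ascending factorial [a]_k = a (a+1) ... (a+k-1), with [a]_0 = 1."""
    result = 1.0
    for r in range(k):
        result *= a + r
    return result

def rho_gamma(alpha0: float, alpha1: float, alpha2: float, n: int) -> float:
    """rho_n = [a0]_n / sqrt([a0+a1]_n [a0+a2]_n) for (X0+X1, X0+X2)."""
    return rising(alpha0, n) / math.sqrt(
        rising(alpha0 + alpha1, n) * rising(alpha0 + alpha2, n)
    )

# Nevzorov's setup: alpha0 = n, alpha1 = 0, alpha2 = m (here n = 3, m = 2).
print(rho_gamma(3, 0, 2, 1), math.sqrt(3 / 5))
# Splitting records: alpha0 = n, alpha1 = n1, alpha2 = n2 (here 3, 2, 4).
print(rho_gamma(3, 2, 4, 1), 3 / math.sqrt(35))
# Non-integer shape parameters are allowed as well.
seq = [rho_gamma(1.5, 0.5, 2.0, n) for n in range(1, 8)]
print(all(x > y for x, y in zip(seq, seq[1:])))  # strictly decreasing
```

The first term $\rho_1$ is the maximal correlation in each case, since the sequence decreases whenever $\alpha_1 + \alpha_2 > 0$.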



Theorem 4.2 includes Nevzorov's (1992) characterization because, taking $\lambda = 1$,
$\alpha_0 = n$, $\alpha_1 = 0$, $\alpha_2 = m$ and $g_1(u) = g_2(u) = F^{-1}(1 - e^{-u})$, $u > 0$, we have that,
under the standard record model, $(R_n, R_{n+m}) \stackrel{d}{=} (g(W_n), g(W_{n+m})) \stackrel{d}{=} (g(X), g(Y))$,
where $(W_n, W_{n+m})$ are the corresponding upper records from the standard exponential
distribution. Clearly, it also includes the result on splitting record models of
Theorem 4.1 - the only difference being that, due to Lemma 4.1, one has now to put
$\alpha_1 = n_1$ (rather than $\alpha_1 = 0$) and $\alpha_2 = n_2$ (rather than $\alpha_2 = m$).

Also, it is of some interest to note that (11) yields the covariance identity (cf.
Afendras et al. (2011))

$$\mathrm{Cov}[g_1(X), g_2(Y)] = \sum_{n=1}^{\infty} \frac{[\alpha_0]_n}{n!\,[\alpha_0 + \alpha_1]_n [\alpha_0 + \alpha_2]_n}\, \mathrm{E}[X^n g_1^{(n)}(X)]\, \mathrm{E}[Y^n g_2^{(n)}(Y)], \tag{26}$$

provided that $g_1, g_2 \in C^{\infty}(0, \infty)$, $g_1(X) \in L^2(X)$, $g_2(Y) \in L^2(Y)$, and assuming
that $\mathrm{E}|X^n g_1^{(n)}(X)| < \infty$ and $\mathrm{E}|Y^n g_2^{(n)}(Y)| < \infty$ for all n, where $g_i^{(n)}$ denotes the
n-th derivative of $g_i$, i = 1, 2. Of course one can apply (26) to the case $\alpha_1 = \alpha_2 = 0$,
$\alpha_0 > 0$; then, $X = Y \sim \Gamma(\alpha_0; \lambda)$ and we get the (known) generalized Stein-type
identity for the $\Gamma(\alpha_0; \lambda)$ distribution:

$$\mathrm{Cov}[g_1(X), g_2(X)] = \sum_{n=1}^{\infty} \frac{1}{n!\,[\alpha_0]_n}\, \mathrm{E}[X^n g_1^{(n)}(X)]\, \mathrm{E}[X^n g_2^{(n)}(X)]. \tag{27}$$

Also, we can apply (26) to the classical record setup from the standard exponential
(on taking $\lambda = 1$, $\alpha_0 = n$, $\alpha_1 = 0$ and $\alpha_2 = m$); then we get the identity

$$\mathrm{Cov}[g_1(W_n), g_2(W_{n+m})] = \sum_{k=1}^{\infty} \frac{1}{k!\,[n+m]_k}\, \mathrm{E}[W_n^k g_1^{(k)}(W_n)]\, \mathrm{E}[W_{n+m}^k g_2^{(k)}(W_{n+m})]. \tag{28}$$
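
Both identities can be sanity-checked with linear functions, for which only the first term of each series is nonzero. The choice of identity functions below is an illustrative assumption, not an example from the text.

```latex
\begin{align*}
\text{(27) with } g_1(x) = g_2(x) = x:\quad
 \mathrm{Cov}(X, X) &= \frac{1}{1!\,[\alpha_0]_1}\,\big(\mathrm{E}X\big)^2
  = \frac{1}{\alpha_0}\cdot\frac{\alpha_0^2}{\lambda^2}
  = \frac{\alpha_0}{\lambda^2} = \mathrm{Var}\,X,\\
\text{(28) with } g_1(u) = g_2(u) = u:\quad
 \mathrm{Cov}(W_n, W_{n+m}) &= \frac{1}{1!\,[n+m]_1}\,\mathrm{E}(W_n)\,\mathrm{E}(W_{n+m})
  = \frac{n\,(n+m)}{n+m} = n = \mathrm{Var}\,W_n .
\end{align*}
```

The second line agrees with the direct computation from $W_{n+m} = W_n + (E_{n+1} + \cdots + E_{n+m})$, where the added sum is independent of $W_n$.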



5 Conclusions 

It is clear that the simplicity of the proposed method depends heavily on the polyno- 
mial regression property, A3, which is satisfied by all bivariate distributions discussed 
in the present article. Castaño-Martínez et al. (2007) developed a correlation model
for partial minima (or maxima) rather than records. Their Section 3 indicates that
many difficulties can enter the correlation problem when A3 fails; it seems that,
in such cases, one has to calculate the values of $\rho_{n,k} = \mathrm{E}[\phi_n(X)\psi_k(Y)]$ for all n and
k. This is not an easy task in general, in contrast to the present simplified case,
where knowledge of the values $A_n$ and $B_n$ in A3 suffices for calculating the maximal
correlation coefficient.